Abstract: A joint communication and channel state estimation problem is investigated, in which reliable information transmission over a noisy channel and high-fidelity estimation of the channel state are simultaneously sought. The tradeoff between the achievable information rate and the estimation distortion is quantified by formulating the problem as a constrained channel coding problem, and the resulting capacity-distortion function characterizes the fundamental limit of the joint communication and channel estimation problem. The analytical results are illustrated through case studies, and further issues such as multiple cost constraints, channel uncertainty, and capacity per unit distortion are also briefly discussed.

I. Introduction 

In this paper, we consider the problem of joint communication and channel estimation over a channel with a time-varying channel state. We consider a noisy channel with a random channel state that evolves with time in a memoryless fashion and is available to neither the transmitter nor the receiver. The objective is to have the receiver recover both the information transmitted from the transmitter and the state of the channel over which the information was transmitted. The problem setting may prove relevant for situations such as environment monitoring in sensor networks [1], underwater acoustic applications [2], and cognitive radio [3]. A distinct feature of our problem formulation is that both communication and channel estimation are required.

The interplay between information measures and estimation (minimum mean-squared error (MMSE) in particular) has long been investigated; see, e.g., [4] and references therein. Previously, however, estimation served only to facilitate information transmission rather than being a separate goal. For example, a common strategy in block interference channels [5] is channel estimation via training [6]. The purpose of channel training there is only to increase the information rate for communication, and thus the quality of the channel estimate is not traded off against the information rate, as we do in this paper.

The problem formulation in [7], [8] bears some similarity 
to the one we consider in that the receiver is interested in 
both communication and channel estimation. It differs from 
our work in a critical way: the channel state is assumed 
non-causally known at the transmitter. In contrast, neither the 
transmitter nor the receiver knows the channel state in our 
problem formulation. 

Intuitively, there exists a tradeoff between a channel's capability to transfer information and its capability to exhibit its state. Increasing randomness in the channel inputs increases information transfer while reducing the receiver's ability to estimate the channel. In contrast, deterministic signaling facilitates channel estimation at the expense of zero information transfer. In this paper, we show that the optimal tradeoff can be formulated as a channel coding problem, with the channel input distribution constrained by an average "estimation cost" constraint.

The rest of this paper is organized as follows. Section II introduces the channel model and the capacity-distortion function, and Section III formulates the equivalent constrained channel coding problem. Section IV illustrates the application of the capacity-distortion function through several simple examples. Section V briefly discusses some related issues, including multiple cost constraints, channel uncertainty, and capacity per unit distortion. Finally, Section VI concludes the paper.

II. Channel Model 

We consider the channel model in Figure 1. For a length-$n$ block of channel inputs, a message $M$ is selected uniformly at random from $\{1, \dots, \lceil e^{nR} \rceil\}$ and is encoded by the encoder, generating the corresponding channel inputs $(X_1, \dots, X_n)$. We provide the following definition.

Definition 1: (Encoder) An encoder is defined by a function $f_n : \mathcal{M} = \{1, \dots, \lceil e^{nR} \rceil\} \to \mathcal{X}^n$, for each $n \in \mathbb{N}$.
Fig. 1. Channel model for joint communication and channel estimation.

The channel is described by a transition function $P(y \mid x, s)$, which is the probability distribution of the channel output $Y$ conditioned on the channel input $X$ and the channel state $S$. Upon receiving the length-$n$ block of channel outputs, the joint decoder and estimator (defined below) declares $\hat{M} \in \{1, \dots, \lceil e^{nR} \rceil\}$ as the decoded message and produces a length-$n$ block of estimates $\hat{S}^n$ of the channel state.

For technical purposes, in this paper, we assume that the random channel state evolves with time in a memoryless fashion. We note that this model encompasses the block interference channel model, because we can treat a block as a super-symbol and thus convert a block interference channel into a memoryless channel.

Definition 2: (Joint decoder and estimator) A joint decoder and estimator is defined by a pair of functions, $g_n : \mathcal{Y}^n \to \mathcal{M}$ and $h_n : \mathcal{Y}^n \to \hat{\mathcal{S}}^n$, for each $n \in \mathbb{N}$.

This definition differs from that of the conventional channel decoder (e.g., [9]) in that it explicitly requires estimation of the channel state $S$ at the receiver. The quality of estimation is measured by the distortion function $d : \mathcal{S} \times \hat{\mathcal{S}} \to \mathbb{R}^+ \cup \{0\}$. That is, if $\hat{S}_i$ is the $i$th element of $h_n(Y^n)$, then $d(S_i, \hat{S}_i)$ denotes the distortion at time $i$, $i = 1, \dots, n$. For technical convenience, we assume that $d(\cdot, \cdot)$ is bounded from above, so that there exists a finite $T > 0$ with $d(s, \hat{s}) \le T < \infty$ for any $s \in \mathcal{S}$ and $\hat{s} \in \hat{\mathcal{S}}$. For length-$n$ block coding schemes, the average distortion is given by

$$d(S^n, \hat{S}^n) = \frac{1}{n} \sum_{i=1}^{n} d(S_i, \hat{S}_i). \qquad (1)$$

Finally, we have the following definitions. 

Definition 3: (Achievable rate) A nonnegative number $R(D)$ is an achievable rate if there exists a sequence of encoders and corresponding joint decoders and estimators such that (a) the average probability of decoding error

$$P_e^{(n)} = \frac{1}{\lceil e^{nR} \rceil} \sum_{m=1}^{\lceil e^{nR} \rceil} \Pr[\hat{M} \ne m \mid M = m]$$

tends to zero as $n \to \infty$; and (b) the average distortion in channel state estimation satisfies

$$\limsup_{n \to \infty} \mathbb{E}\, d(S^n, \hat{S}^n) \le D. \qquad (2)$$

Definition 4: (Capacity-distortion function) The capacity-distortion function is defined as

$$C(D) = \sup_{\{f_n, g_n, h_n\}} R(D). \qquad (3)$$

Remark: The reader may want to distinguish between the 
capacity-distortion function and the rate-distortion function in 
lossy source coding [9]. The capacity-distortion function is 
defined with respect to a state-dependent channel, seeking 
to characterize the fundamental tradeoff between the rate of 
information transmission and the distortion of state estimation. 
In contrast, the rate-distortion function is defined with respect 
to a source distribution, seeking to characterize the fundamen- 
tal tradeoff between the rate of its lossy description and the 
achievable distortion due to the description. 

III. A Constrained Channel Coding Formulation 

In this section, we show that the joint communication and 
channel estimation problem can be equivalently formulated as 
a constrained channel coding problem. For this purpose, the 
following minimum conditional distortion will be important. 
The minimum conditional distortion function is defined for each possible realization of the channel input $X$ as

$$d^*(x) = \inf_{h_0} \mathbb{E}\left[d(S, h_0(x, Y))\right], \qquad (4)$$

where the expectation is with respect to the channel state $S$ and the channel output $Y$ conditioned upon the channel input $X = x$, and $h_0 : \mathcal{X} \times \mathcal{Y} \to \hat{\mathcal{S}}$ denotes an arbitrary one-shot estimator of $S$ given the channel input and output.
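For finite alphabets, the infimum in (4) is attained by choosing, for each output $y$, the estimate $\hat{s}$ that minimizes $\sum_s P(s) P(y \mid x, s) d(s, \hat{s})$. The Python sketch below computes $d^*(x)$ this way, assuming NumPy is available; the function name and the array layout `P_y_given_xs[x, s, y]` are our own illustrative choices, not notation from this paper.

```python
import numpy as np

def min_conditional_distortion(P_y_given_xs, P_s, dist):
    """Return d*(x) of (4) for every input letter x (finite alphabets).

    P_y_given_xs[x, s, y] = P(y | x, s)   channel transition function
    P_s[s]                = P(s)          state distribution
    dist[s, s_hat]        = d(s, s_hat)   distortion function
    """
    d_star = np.zeros(P_y_given_xs.shape[0])
    for x in range(P_y_given_xs.shape[0]):
        joint = P_s[:, None] * P_y_given_xs[x]   # joint[s, y] = P(S = s, Y = y | X = x)
        cost = joint.T @ dist                    # cost[y, s_hat] = sum_s P(s, y | x) d(s, s_hat)
        d_star[x] = cost.min(axis=1).sum()       # best s_hat for each y, summed over y
    return d_star

# Example: the scalar multiplicative channel Y = S X of Section IV-B with r = 0.3
r = 0.3
P = np.zeros((2, 2, 2))
for x in (0, 1):
    for s in (0, 1):
        P[x, s, x * s] = 1.0
d_star_example = min_conditional_distortion(P, np.array([1 - r, r]), 1.0 - np.eye(2))
# d_star_example equals [r, 0]: input 0 reveals nothing about S, input 1 reveals S exactly
```

The example reuses the scalar multiplicative channel of Section IV-B, for which the text derives $d^*(0) = r$ and $d^*(1) = 0$ under Hamming distortion.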

The following theorem establishes the constrained channel 
coding formulation. 

Theorem 1: The capacity-distortion function for the channel model in Figure 1 is given by

$$C(D) = \sup_{P_X \in \mathcal{P}_D} I(X; Y), \qquad (5)$$

where

$$\mathcal{P}_D = \left\{ P_X : \sum_{x \in \mathcal{X}} P_X(x)\, d^*(x) \le D \right\}. \qquad (6)$$
Remark: Theorem 1 applies to general input, output, and state alphabets. If $X$ is a continuous random variable, the summation in (6) should be understood as an integral over $\mathcal{X}$.
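For a binary input alphabet, (5) and (6) reduce to a one-dimensional search over $P_X(1)$, so a brute-force sketch is easy to write. The hypothetical helper below assumes NumPy, natural logarithms (nats), and a channel matrix `W[x, y]` that has already averaged out the state; this is legitimate because $I(X;Y)$ in (5) depends only on $P(y \mid x) = \sum_s P(s) P(y \mid x, s)$. It is meant as a numerical check, not an efficient solver.

```python
import numpy as np

def mutual_information(p_x, W):
    """I(X;Y) in nats for input law p_x and channel matrix W[x, y] = P(y | x)."""
    p_y = p_x @ W
    return sum(p_x[x] * W[x, y] * np.log(W[x, y] / p_y[y])
               for x in range(W.shape[0]) for y in range(W.shape[1])
               if p_x[x] > 0 and W[x, y] > 0)

def capacity_distortion_binary(W, d_star, D, grid=20001):
    """C(D) of (5)-(6) for a binary input alphabet, by brute-force search over P_X(1).

    W must already average over the state: W[x, y] = sum_s P(s) P(y | x, s)."""
    best = -np.inf                               # stays -inf if P_D is empty (D < min d*)
    for p in np.linspace(0.0, 1.0, grid):
        p_x = np.array([1.0 - p, p])
        if p_x @ d_star <= D + 1e-12:            # P_X belongs to P_D in (6)
            best = max(best, mutual_information(p_x, W))
    return best

r = 0.3
W = np.array([[1.0, 0.0], [1 - r, r]])           # Y = S X with P(S = 1) = r, S marginalized out
d_star = np.array([r, 0.0])                      # d*(0) = r, d*(1) = 0
```

For example, `capacity_distortion_binary(W, d_star, 0.0)` forces $P_X(1) = 1$ and returns zero rate, while any $D \ge r$ leaves the constraint inactive and returns the unconstrained capacity.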

In order to prove Theorem 1, we shall employ the following lemmas.

Lemma 1: For any $(f_n, g_n, h_n)$-sequence that achieves $C(D)$, as $n \to \infty$, the achieved average distortion (1) is (in probability) equal to the average distortion with $\hat{S}^n$ replaced by

$$\hat{S}^n = h_n^*(X^n, Y^n), \qquad (7)$$

where $h_n^*(X^n, Y^n)$ denotes the block-$n$ estimator that achieves the minimum average distortion conditioned upon both the block-$n$ channel inputs and outputs.

Proof: For each $n$, let us replace the estimator $h_n$ by $h_n^*$ in (1), with its first argument being the channel inputs $\hat{X}^n = f_n(\hat{M})$ corresponding to the decoded message $\hat{M}$. When $\hat{M} = M$, the minimum average distortion is achieved by $h_n^*$; when $\hat{M} \ne M$, the increment in the average distortion due to replacing $h_n$ by $h_n^*$ is bounded from above because $d(\cdot, \cdot) \le T < \infty$. By Definitions 3 and 4, as $n \to \infty$, the average probability of decoding error $P_e^{(n)} \to 0$. Hence, as $n \to \infty$, the minimum average distortion is achieved by $h_n^*(X^n, Y^n)$, which is further equal to (1) in probability. Q.E.D.

Lemma 1 shows that the joint decoder and estimator can utilize the reliably decoded channel inputs for channel state estimation. The next lemma, Lemma 2, further shows that the length-$n$ block estimator can be decomposed into $n$ one-shot estimators, one for each channel use.

Lemma 2: For any $(f_n, g_n, h_n)$-sequence that achieves $C(D)$, as $n \to \infty$, the achieved average distortion (2) is (in probability) equal to that achieved by

$$\hat{S}_i = h_0^*(X_i, Y_i), \quad i = 1, \dots, n, \qquad (8)$$

where $h_0^*(X_i, Y_i)$ denotes the one-shot estimator that achieves the minimum expected distortion for $S_i$ conditioned upon both the channel input $X_i$ and output $Y_i$.

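Proof: From Lemma 1, as $n \to \infty$, $h_n(Y^n)$ is in probability equivalent to $h_n^*(X^n, Y^n)$. The decomposition (8) then follows because the channel is memoryless. For each fixed $n$, we have

$$P(S^n \mid X^n, Y^n) = \frac{P(X^n, Y^n, S^n)}{P(X^n, Y^n)} = \frac{P(X^n) \prod_{i=1}^{n} P(Y_i \mid X_i, S_i) P(S_i)}{P(X^n) \prod_{i=1}^{n} \left[ \sum_{s_i \in \mathcal{S}} P(Y_i \mid X_i, s_i) P(s_i) \right]} = \prod_{i=1}^{n} P(S_i \mid X_i, Y_i). \qquad (9)$$

Since the posterior factorizes and the distortion (1) is additive over time, the $i$th component of $h_n^*$ minimizes $\mathbb{E}[d(S_i, \hat{S}_i) \mid X_i, Y_i]$ and therefore coincides with $h_0^*(X_i, Y_i)$. As we take $n \to \infty$, the lemma is established. Q.E.D.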
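Lemma 2 rests on the factorization (9) of the block posterior. The sketch below (ours, using NumPy) checks its consequence on a small case. It assumes Hamming distortion, for which the optimal estimate of each $S_i$ is the MAP estimate under its marginal posterior, and uses the scalar multiplicative channel of Section IV-B with an arbitrarily chosen $r = 0.3$ as the test channel. It brute-forces the block posterior for $n = 3$ over every feasible input/output pair and compares each coordinate's block estimate with the one-shot estimate.

```python
import itertools
import numpy as np

r = 0.3                        # assumed P(S = 1)
P_s = np.array([1 - r, r])     # i.i.d. (memoryless) state distribution

def channel(y, x, s):
    """P(y | x, s) for the test channel Y = S X."""
    return 1.0 if y == x * s else 0.0

def block_estimate(xs, ys):
    """Per-coordinate MAP estimate of S^n from the block posterior P(s^n | x^n, y^n)."""
    post = np.zeros((len(xs), 2))   # post[i, s] is proportional to P(S_i = s | x^n, y^n)
    for ss in itertools.product((0, 1), repeat=len(xs)):
        w = np.prod([P_s[s] * channel(y, x, s) for x, y, s in zip(xs, ys, ss)])
        for i, s in enumerate(ss):
            post[i, s] += w
    return tuple(int(np.argmax(row)) for row in post)

def one_shot_estimate(x, y):
    """One-shot MAP estimate h_0^*(x, y), using only the current input and output."""
    return int(np.argmax([P_s[s] * channel(y, x, s) for s in (0, 1)]))

n, agree = 3, True
for xs in itertools.product((0, 1), repeat=n):
    for ys in itertools.product((0, 1), repeat=n):
        # skip (x^n, y^n) pairs that have zero probability under the channel
        if all(channel(y, x, 0) + channel(y, x, 1) > 0 for x, y in zip(xs, ys)):
            agree &= block_estimate(xs, ys) == tuple(one_shot_estimate(x, y) for x, y in zip(xs, ys))
```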

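Proof of Theorem 1: From Lemmas 1 and 2, we can rewrite the average distortion constraint (2) as

$$\limsup_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\, d(S_i, \hat{S}_i) \le D \;\Longrightarrow\; \limsup_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\, d\big(S_i, h_0^*(X_i, Y_i)\big) \le D. \qquad (10)$$

Utilizing (4) and the fact that the channel is memoryless, we have $\mathbb{E}\, d(S_i, h_0^*(X_i, Y_i)) = \mathbb{E}\, d^*(X_i)$, so we can further deduce from (10) that

$$\mathbb{E}\, d^*(X) \le D, \qquad (11)$$

where the expectation is with respect to the time-averaged channel input distribution.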



So now the constraints in Definition 3 reduce to having $P_e^{(n)} \to 0$ as $n \to \infty$, subject to the constraint (11). This is exactly the problem of channel coding with a cost constraint on the input distribution, and Theorem 1 directly follows from standard proofs; see, e.g., [10]. Q.E.D.

Discussion: 

(1) The proof of Theorem 1 suggests that the joint decoder and estimator first decodes the transmitted message in a "noncoherent" fashion, and then utilizes the reconstructed channel inputs, along with the channel outputs, to estimate the channel states. As the coding block length grows large, such a two-stage procedure becomes asymptotically optimal.

(2) For each $x \in \mathcal{X}$, $d^*(x)$ quantifies its associated minimum distortion. Alternatively, $d^*(x)$ can be viewed as the "estimation cost" due to signaling with $x$. Hence the average distortion constraint in (6) regulates the input distribution so that the signaling is estimation-efficient. We emphasize that $d^*(x)$ depends on the channel through the distribution of the channel state $S$, and thus differs from other usual costs such as symbol energies or time durations.

(3) A key condition that leads to the constrained channel 
coding formulation is that the channel is memoryless. Due to 
the memoryless property, we can decompose a block estimator 
into multiple one-shot estimators, without loss of optimality 
asymptotically. If the channel state evolves with time in a 
correlated fashion, then such a decomposition is generally 
suboptimal. 

IV. Illustrative Examples 

In this section, we discuss several simple examples to illustrate the application of Theorem 1.



A. Uniform Estimation Costs 

A special case is that $d^*(x) = d_0$ for all $x \in \mathcal{X}$. For such channels, the average cost constraint in (6) exhibits a singular behavior. If $D < d_0$, then the joint communication and channel estimation problem is infeasible; otherwise, $\mathcal{P}_D$ consists of all possible input distributions, and thus the capacity-distortion function $C(D)$ is equal to the unconstrained capacity of the channel. One of the simplest channels with uniform estimation costs is the additive channel $Y_i = X_i + S_i$: once the receiver reliably decodes $M$, it can subtract $X_i$ from $Y_i$ and recover $S_i$ exactly.
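The uniform-cost behavior can be checked on a small hypothetical instance of the additive channel: binary $X$ and $S$ combined by ordinary integer addition (so $Y \in \{0, 1, 2\}$), an assumed state distribution, and Hamming distortion. The sketch below is ours and enumerates the one-shot estimator of (4) directly.

```python
# Hypothetical instance of the additive channel Y = X + S (ordinary integer addition)
X_ALPHABET = (0, 1)
S_ALPHABET = (0, 1)
P_S = {0: 0.6, 1: 0.4}          # assumed state distribution

def d_star_additive(x):
    """Minimum conditional Hamming distortion d*(x) of (4) for Y = x + S."""
    total = 0.0
    for y in sorted({x + s for s in S_ALPHABET}):         # outputs reachable from x
        # joint[s] = P(S = s, Y = y | X = x), nonzero only for s = y - x
        joint = {s: (P_S[s] if x + s == y else 0.0) for s in S_ALPHABET}
        # optimal one-shot estimate for this y: smallest probability of error
        total += min(sum(p for s, p in joint.items() if s != s_hat) for s_hat in S_ALPHABET)
    return total

costs = [d_star_additive(x) for x in X_ALPHABET]   # uniform cost: both entries equal 0
```

Because $d^*(x) = 0$ for every $x$ here, every input distribution belongs to $\mathcal{P}_D$ for any $D \ge 0$, so $C(D)$ reduces to the unconstrained capacity.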

B. A Scalar Multiplicative Channel 

Consider the following scalar multiplicative channel

$$Y = S X, \qquad (12)$$

where all the alphabets are binary, $\mathcal{X} = \mathcal{Y} = \mathcal{S} = \{0, 1\}$, and the multiplication is in the conventional sense for real numbers. The reader may interpret $S$ as the status of an informed jamming source, a fading level, or the status of another transmitter. Activating $S$ to its "effective status" $S = 0$ shuts down the link between $X$ and $Y$; otherwise, the link $X \to Y$ is essentially noiseless. We take the distortion measure to be the Hamming distance: $d(s, \hat{s}) = 1$ if $\hat{s} \ne s$, and $0$ otherwise.

The tradeoff between communication and channel estimation is straightforward to observe from the nature of the channel: for good estimation of $S$, we want $X = 1$ as often as possible, whereas this would reduce the achieved information rate. In this example, we assume that $P(S = 1) = r < 1/2$. We shall optimize $P(X = 1)$, denoted by $p \in [0, 1]$. The channel mutual information is $I(X; Y) = H_2(pr) - p \cdot H_2(r)$, where $H_2(\cdot)$ denotes the binary entropy function $H_2(t) = -t \log t - (1 - t) \log(1 - t)$ (logarithms are natural throughout this example). For $x = 0$, the optimal one-shot estimator is $\hat{S} = 0$ (note that $P(S = 1) = r < 1/2$), and the resulting minimum conditional distortion is $d^*(0) = r$. For $x = 1$, the optimal one-shot estimator is $\hat{S} = Y = S$, leading to $d^*(1) = 0$. Therefore the input distribution should satisfy $(1 - p) r \le D$.

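After some manipulations, we find that the optimal solution is given by

$$\begin{cases} p^* = \dfrac{1}{r\,(1 + e^{H_2(r)/r})}, \quad C(D) = H_2(p^* r) - p^* \cdot H_2(r), & \text{if } D \ge r - \dfrac{1}{1 + e^{H_2(r)/r}}; \\[2ex] p^* = 1 - \dfrac{D}{r}, \quad C(D) = H_2(r - D) - \left(1 - \dfrac{D}{r}\right) H_2(r), & \text{otherwise.} \end{cases}$$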
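The two-branch solution can be checked against a direct grid search over $p$ of $H_2(pr) - p H_2(r)$ subject to $(1 - p) r \le D$. The sketch below is ours (nats throughout) and implements both branches; the list `curve` shows how one might tabulate points for a plot like Fig. 2, with $r = 0.2$ chosen arbitrarily.

```python
import math

def H2(t):
    """Binary entropy in nats, with H2(0) = H2(1) = 0."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * math.log(t) - (1 - t) * math.log(1 - t)

def scalar_cd(D, r):
    """Return (C(D), p*) for Y = S X from the two-branch solution; assumes 0 < r < 1/2, D >= 0."""
    a = math.exp(H2(r) / r)
    if D >= r - 1.0 / (1.0 + a):
        # constraint inactive: unconstrained maximizer of H2(p r) - p H2(r)
        p = 1.0 / (r * (1.0 + a))
        return H2(p * r) - p * H2(r), p
    # constraint active: (1 - p) r = D, hence p r = r - D
    p = 1.0 - D / r
    return H2(r - D) - p * H2(r), p

# (D, C(D)) pairs for one curve of a plot like Fig. 2, with r = 0.2 chosen arbitrarily
curve = [(D, scalar_cd(D, 0.2)[0]) for D in (0.0, 0.02, 0.05, 0.1, 0.15, 0.2)]
```

Once $D$ passes the threshold, `scalar_cd` returns the same value for every larger $D$, which is the flat part of the capacity-distortion curve.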



From the solution, we observe the following. For relatively large $D$, the average distortion constraint is not active, and thus the optimal input distribution coincides with that for the unconstrained channel capacity. As the estimation distortion constraint $D$ falls below a threshold, the average distortion constraint becomes active, and the capacity-distortion function $C(D)$ deviates from the unconstrained channel capacity. We can show from the expression of $C(D)$ that, as $D \to 0$,

$$C(D) = \frac{1}{r} \log\frac{1}{1 - r} \cdot D + o(D). \qquad (13)$$
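The slope in (13) can be obtained by differentiating the active-constraint branch of the solution at $D = 0$, where $C(0) = H_2(r) - H_2(r) = 0$, using $H_2'(t) = \log\frac{1 - t}{t}$:

```latex
\begin{aligned}
\left.\frac{dC}{dD}\right|_{D=0}
  &= -H_2'(r) + \frac{H_2(r)}{r}
   = -\log\frac{1-r}{r} - \log r - \frac{1-r}{r}\log(1-r) \\
  &= -\left(1 + \frac{1-r}{r}\right)\log(1-r)
   = \frac{1}{r}\log\frac{1}{1-r}.
\end{aligned}
```

Since $C(0) = 0$, this derivative is exactly the coefficient of the linear term in (13).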



Figure 2 depicts $C(D)$ versus $D$ for different values of $r$. The tradeoff between communication rates and estimation distortions is clearly visible.




Fig. 2. Capacity-distortion function for the scalar multiplicative channel. 



C. A Block Multiplicative Channel 

A generalization of the scalar multiplicative channel is the following block multiplicative channel

$$\mathbf{Y} = S\,\mathbf{X}, \qquad (14)$$

where $\mathbf{X}$ and $\mathbf{Y}$ are length-$K$ blocks, so that the super-symbols in the block memoryless channel have alphabets $\mathcal{X}^K = \mathcal{Y}^K = \{0, 1\}^K$. The channel state $S \in \mathcal{S} = \{0, 1\}$ remains fixed within each block and changes in a memoryless fashion across blocks. We again adopt the Hamming distance as the distortion measure.

For such a channel, there are $2^K$ possible vectors for an input super-symbol. However, all of them except the all-zero $\mathbf{x} = \mathbf{0}$ are symmetric: they all lead to the same conditional distribution for $\mathbf{Y}$ (up to a relabeling of the nonzero output), as well as the same minimum conditional distortion $d^*(\mathbf{x}) = 0$, $\forall \mathbf{x} \ne \mathbf{0}$. So, from the concavity of the channel mutual information in the input distribution, the optimal input distribution should take the following form:

$$P_{\mathbf{X}}(\mathbf{0}) = 1 - p, \quad P_{\mathbf{X}}(\mathbf{x}) = \frac{p}{2^K - 1}, \quad \forall \mathbf{x} \ne \mathbf{0}.$$

We can find that the channel mutual information per channel use is

$$\frac{I(\mathbf{X}; \mathbf{Y})}{K} = \frac{1}{K} \left\{ H_2(pr) + p \cdot \left[ r \log(2^K - 1) - H_2(r) \right] \right\}, \qquad (15)$$

and that the average distortion constraint is

$$(1 - p) r \le D,$$

the same as that in the scalar multiplicative channel case. After some manipulations, we find that the resulting optimal solution for general $K \ge 1$ is



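$$
\begin{aligned}
&\text{Case 1: } 2^K \ge 1 + (1 - r)^{-1/r}. \\
&\qquad p^* = 1, \quad C(D) = \frac{r \log(2^K - 1)}{K}, \quad \forall D \ge 0. \\
&\text{Case 2: } 2^K < 1 + (1 - r)^{-1/r}. \\
&\qquad \text{If } D \ge r - \frac{2^K - 1}{2^K - 1 + e^{H_2(r)/r}}: \quad p^* = \frac{2^K - 1}{r\,(2^K - 1 + e^{H_2(r)/r})}, \\
&\qquad\qquad C(D) = \frac{1}{K} \left\{ H_2(p^* r) + p^* \left[ r \log(2^K - 1) - H_2(r) \right] \right\}; \\
&\qquad \text{else: } p^* = 1 - \frac{D}{r}, \quad C(D) = \frac{1}{K} \left\{ H_2(r - D) + p^* \left[ r \log(2^K - 1) - H_2(r) \right] \right\}.
\end{aligned}
\qquad (16)
$$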


Case 1 arises because, if the channel block length $K$ is sufficiently large that $2^K > 1 + (1 - r)^{-1/r}$, then the value of $p^*$ given by Case 2 would be greater than one, which is impossible for a valid probability. In Case 1, we have $P_{\mathbf{X}}(\mathbf{0}) = 0$, and all the nonzero symbols are selected with equal probability $1/(2^K - 1)$.
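The sketch below (ours, in nats) does two things for the block multiplicative channel: it checks (15) by enumerating all $2^K$ input and output blocks under the symmetric input law above, and it evaluates the case analysis (16). The function names are hypothetical, and $0 < r < 1/2$ and $D \ge 0$ are assumed, as in the text.

```python
import itertools
import math

def H2(t):
    """Binary entropy in nats, with H2(0) = H2(1) = 0."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * math.log(t) - (1 - t) * math.log(1 - t)

def block_mi_enumerated(K, p, r):
    """I(X;Y)/K by direct enumeration, for Y = S X with the symmetric input law above."""
    blocks = list(itertools.product((0, 1), repeat=K))
    zero = (0,) * K
    p_x = {x: (1 - p) if x == zero else p / (2**K - 1) for x in blocks}
    def w(y, x):  # P(y | x): Y = x if S = 1 (prob. r), Y = 0 if S = 0 (prob. 1 - r)
        return (r if y == x else 0.0) + ((1 - r) if y == zero else 0.0)
    p_y = {y: sum(p_x[x] * w(y, x) for x in blocks) for y in blocks}
    info = sum(p_x[x] * w(y, x) * math.log(w(y, x) / p_y[y])
               for x in blocks for y in blocks if p_x[x] > 0 and w(y, x) > 0)
    return info / K

def block_cd(D, r, K):
    """Return (C(D), p*) from (16), in nats per channel use."""
    m = 2**K - 1
    g = r * math.log(m) - H2(r)                  # bracketed coefficient in (15)
    if 2**K >= 1 + (1 - r) ** (-1 / r):          # Case 1
        p = 1.0
    else:                                        # Case 2: inactive or active constraint
        a = math.exp(H2(r) / r)
        p = m / (r * (m + a)) if D >= r - m / (m + a) else 1.0 - D / r
    return (H2(p * r) + p * g) / K, p
```

With $K = 1$, `block_cd` reduces to the scalar solution of Section IV-B, since $\log(2^1 - 1) = 0$ and Case 1 can never occur.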

In fact, Case 1 kicks in for rather small values of $K$. In our channel model we have assumed $r < 1/2$. For $r$ smaller than approximately $0.174$, Case 1 arises for $K \ge 2$; for larger $r$, Case 1 arises for $K \ge 3$.
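The crossover value of $r$ can be located numerically. Case 1 requires $(1 - r)^{-1/r} \le 2^K - 1$, and $(1 - r)^{-1/r}$ increases from $e$ (as $r \to 0$) to $4$ (at $r = 1/2$), so a simple bisection suffices. The sketch below is ours; for $K = 2$ it should return a value near the threshold quoted above, and for $K = 3$ it returns $1/2$ because the condition then holds for every admissible $r$.

```python
def case1_threshold(K, iters=200):
    """Largest r in (0, 1/2] satisfying 2^K >= 1 + (1 - r)^(-1/r), i.e. Case 1 of (16).

    (1 - r)^(-1/r) increases from e (as r -> 0) to 4 (at r = 1/2), so Case 1 holds
    exactly for r up to the returned value; 0.0 means it never holds (e.g. K = 1)."""
    def slack(r):
        return 2**K - 1 - (1 - r) ** (-1 / r)   # >= 0 iff Case 1 holds
    lo, hi = 1e-9, 0.5
    if slack(lo) < 0:
        return 0.0
    if slack(hi) >= 0:
        return hi
    for _ in range(iters):                    # bisection on the sign change of slack
        mid = 0.5 * (lo + hi)
        if slack(mid) >= 0:
            lo = mid
        else:
            hi = mid
    return lo
```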

In the scalar multiplicative channel ($K = 1$), we have noticed that $C(D)$ scales linearly to zero as $D \to 0$; see (13). For $K > 1$, however, we have

$$C(0) = \frac{r \log(2^K - 1)}{K} > 0. \qquad (17)$$



For comparison, let us consider a suboptimal approach based upon training that transmits $X = 1$ in the first channel use of each channel block. The receiver can thus perfectly estimate the channel state $S$ and achieve $D = 0$. The encoder can then use the remaining $K - 1$ channel uses in each channel block to encode information, and the resulting achievable rate is

$$R(0) = \frac{r \log(2^{K-1})}{K} = \frac{r (K - 1) \log 2}{K}. \qquad (18)$$



Comparing C(0) and R(0), we notice that their ratio ap- 
proaches one as K — > oo, consistent with the intuition that 
training usually leads to negligible rate loss for channels with 
long coherence blocks. 
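The ratio of (17) to (18) is $\log(2^K - 1) / ((K - 1) \log 2)$, independent of $r$. A short sketch (ours, in nats) evaluates both rates for a few block lengths, with $r = 0.3$ chosen arbitrarily, to show the ratio decreasing toward one.

```python
import math

def zero_distortion_rates(K, r):
    """C(0) from (17) and the training rate R(0) from (18), both in nats per channel use."""
    c0 = r * math.log(2**K - 1) / K
    r0 = r * (K - 1) * math.log(2) / K
    return c0, r0

# r cancels in the ratio C(0)/R(0) = log(2^K - 1) / ((K - 1) log 2)
ratios = {K: zero_distortion_rates(K, 0.3)[0] / zero_distortion_rates(K, 0.3)[1]
          for K in (2, 4, 8, 16, 64)}
```

For $K = 2$ the ratio is $\log 3 / \log 2$, and it approaches $K/(K-1)$ for large $K$.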

V. Further Issues 

In this section, we briefly discuss a few issues that are 
related to the capacity-distortion function formulation. 

A. Multiple Estimators and Other Cost Constraints 

In certain applications, multiple cost constraints may be present. For example, the receiver may be simultaneously interested in two or more different distortion measures, or the transmitter may have an average energy constraint on the channel input in addition to the average distortion constraint. Multiple cost constraints can be satisfied simultaneously by replacing the feasible set of input distributions $\mathcal{P}_D$ in (6) with the intersection of multiple feasible sets, one for each cost constraint.

For either single or multiple cost constraints, the capacity-distortion function can be defined following Section II, formulated as a constrained channel coding problem following Section III, and computed with efficient algorithms such as the Blahut-Arimoto algorithm [11], [12] for discrete alphabets.
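For discrete alphabets, a cost-constrained Blahut-Arimoto iteration traces the $C(D)$ curve parametrically: for a Lagrange multiplier $s \ge 0$ it maximizes $I(X;Y) - s\,\mathbb{E}[d^*(X)]$, and the maximizer yields one point $(D, C(D))$. The sketch below is our own minimal NumPy version of that standard iteration, not code from [11] or [12]. The iteration count and tolerance are arbitrary choices, and `W` must already marginalize out the state.

```python
import numpy as np

def kl_rows(W, q):
    """D(W[x, :] || q) in nats for each input letter x, with 0 log 0 = 0."""
    out = np.zeros(W.shape[0])
    for x, row in enumerate(W):
        m = row > 0
        out[x] = np.sum(row[m] * np.log(row[m] / q[m]))
    return out

def blahut_arimoto_cost(W, d_star, s, iters=10000, tol=1e-13):
    """Blahut-Arimoto iteration for max_P [I(X;Y) - s E d*(X)] with multiplier s >= 0.

    W[x, y] = P(y | x) with the channel state already marginalized out.
    Returns (D, C): the distortion E d*(X) and rate I(X;Y) in nats at the maximizer."""
    p = np.full(W.shape[0], 1.0 / W.shape[0])
    for _ in range(iters):
        c = p * np.exp(kl_rows(W, p @ W) - s * d_star)   # multiplicative update
        p_new = c / c.sum()
        done = np.max(np.abs(p_new - p)) < tol
        p = p_new
        if done:
            break
    # I(X;Y) = sum_x p(x) D(P_{Y|x} || P_Y)
    return float(p @ d_star), float(p @ kl_rows(W, p @ W))

# Scalar multiplicative channel with r = 0.3: W marginalizes S out of Y = S X
r = 0.3
W = np.array([[1.0, 0.0], [1 - r, r]])
d_star = np.array([r, 0.0])
curve = [blahut_arimoto_cost(W, d_star, s) for s in (0.0, 0.5, 1.0)]
```

Sweeping $s$ upward from $0$ moves the point from the unconstrained capacity toward smaller distortion; with several cost constraints, one multiplier per constraint would be needed.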

B. Uncertainty in Channel State Statistics 

The constrained channel coding formulation in Section III can also be extended to the case in which the distribution of the channel state $S$ is uncertain. For such a compound channel setting, we assume that the joint channel distribution $P_\theta(x, s, y) = P(y \mid x, s) P_X(x) P_{S,\theta}(s)$ is parametrized by an unknown parameter $\theta \in \Theta$, which is induced by the parametrized distribution of $S$, $P_{S,\theta}(s)$. If all the alphabets $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{S}$ are discrete, we can show, following the proof in [13], that the capacity-distortion function of the compound channel is

$$C(D) = \sup_{P_X \in \mathcal{P}_D} \inf_{\theta \in \Theta} I_\theta(X; Y), \qquad (19)$$

where

$$\mathcal{P}_D = \left\{ P_X : \sum_{x \in \mathcal{X}} P_X(x)\, d_\theta^*(x) \le D, \ \forall \theta \in \Theta \right\}. \qquad (20)$$

In $I_\theta(X; Y)$ and $d_\theta^*(x)$, the subscript $\theta$ denotes that they are evaluated with respect to $P_\theta(x, s, y)$.
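For the scalar multiplicative channel with $P(S = 1) = r$ known only to lie in a finite set $\Theta$, (19) and (20) can be evaluated by a grid search over $p = P(X = 1)$. The uncertainty set `thetas` below is a hypothetical choice, and the sketch is ours (nats). Because (20) must hold for every $\theta$ and the objective is a worst case, the compound value can never exceed the capacity-distortion function for any single $\theta$, which gives a simple sanity check.

```python
import math

def H2(t):
    """Binary entropy in nats, with H2(0) = H2(1) = 0."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * math.log(t) - (1 - t) * math.log(1 - t)

def compound_cd(D, thetas, grid=20001):
    """Grid-search evaluation of (19)-(20) for Y = S X when P(S = 1) = r is only known
    to lie in the finite set `thetas`; here d*_theta(0) = r and d*_theta(1) = 0."""
    best = -math.inf            # stays -inf if P_D is empty
    for k in range(grid):
        p = k / (grid - 1)      # p = P(X = 1)
        if all((1 - p) * r <= D + 1e-12 for r in thetas):                 # P_D in (20)
            best = max(best, min(H2(p * r) - p * H2(r) for r in thetas))  # inf over theta
    return best

thetas = (0.2, 0.35)   # hypothetical uncertainty set for r
```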

C. Capacity Per Unit Distortion 

In light of the definition of channel capacity per unit cost for general cost-constrained channels [14], we can analogously define the capacity per unit distortion, and show that it is equal to

$$C_d = \sup_{P_X} \frac{I(X; Y)}{\mathbb{E}[d^*(X)]}.$$
The capacity per unit distortion quantifies the maximum 
efficiency measured by the ratio between the amount of 
transmitted information and the incurred distortion in channel 
state estimation. 

From [14], if $d^*(x) = 0$ for at least two different input letters (with distinct conditional output distributions), then $C_d = \infty$; if there exists a unique $x_0 \in \mathcal{X}$ with $d^*(x_0) = 0$, then $C_d$ is also given by

$$C_d = \sup_{x \ne x_0} \frac{D(P_{Y|x} \,\|\, P_{Y|x_0})}{d^*(x)}, \qquad (21)$$

where $D(\cdot \,\|\, \cdot)$ denotes the Kullback-Leibler divergence between two distributions. Note that in $P_{Y|x}$ we marginalize over the channel state $S$.

Given (21), we can conveniently evaluate $C_d$ for various channels. For example, the scalar multiplicative channel in Section IV-B has $C_d = \frac{1}{r} \log\frac{1}{1 - r}$. In contrast, block multiplicative channels in Section IV-C with $K \ge 2$ have $C_d = \infty$, because all input letters except $\mathbf{0}$ lead to $d^*(\cdot) = 0$.
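Formula (21) is straightforward to evaluate for finite alphabets. The sketch below is ours (NumPy, nats). It returns infinity when two or more letters have zero estimation cost, assuming their conditional output distributions differ, and for $r = 0.3$ it reproduces the scalar multiplicative value quoted above.

```python
import numpy as np

def capacity_per_unit_distortion(W, d_star):
    """C_d from (21) for finite alphabets; W[x, y] = P(y | x) with S marginalized out."""
    zero = np.flatnonzero(d_star == 0)
    if len(zero) >= 2:
        return np.inf        # several zero-distortion letters (assumed to have distinct outputs)
    if len(zero) == 0:
        raise ValueError("(21) requires a zero-distortion input letter")
    x0 = zero[0]
    best = 0.0
    for x in range(len(d_star)):
        if x == x0:
            continue
        m = W[x] > 0
        if np.any(W[x0][m] == 0):
            return np.inf    # D(P_{Y|x} || P_{Y|x0}) is infinite
        best = max(best, np.sum(W[x][m] * np.log(W[x][m] / W[x0][m])) / d_star[x])
    return float(best)

r = 0.3
cd_scalar = capacity_per_unit_distortion(np.array([[1.0, 0.0], [1 - r, r]]), np.array([r, 0.0]))
# cd_scalar agrees with (1/r) log(1/(1 - r)), the slope of C(D) at D = 0 in (13)
```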



VI. Conclusions 

In this paper, we introduce a joint communication and channel estimation problem for state-dependent channels, and characterize its fundamental tradeoff by formulating it as a channel coding problem with the input distribution constrained by an average "estimation cost" constraint. The resulting capacity-distortion function permits a systematic investigation of a channel's ability to support both communication and state estimation. Future research topics include specializing the general framework to particular channel models in realistic applications, and generalizing the results to multiuser systems and to channels with generally correlated state processes.